GPUs present the opportunity to execute highly parallel processes an order of magnitude faster than traditional CPU bound solutions. While GPUs are already widely used in deep learning applications, more focus is being given to their use in traditional data processing when large amounts of data are involved. This article demonstrates how GPUs may be used to parallelise road network routing using the cugraph library, speeding up analysis times exponentially over the commonly used networkx python library.
To test our routing, we will be considering the drive-time distance for a collection of postcodes to their nearest GP practice1. First some functions and classes are loaded in from our base library named ahah, cudf provides GPU accelerated data frames, and pandas provides normal CPU bound data frames.
The road network data and postcodes have already been pre-processed so may be directly loaded into cudf GPU-accelerated data frames through the get_data function. This function takes in a single postcode, a list, or postcodes matched by a regular expression.
The Routing and CPURouting classes iterate over each POI, linked with their nearest road node, calculating the road distance from each postcode node within a 50km buffer, weighted by the estimated road drive-times. The following ‘Setup’ chunk contains two helper functions to take in the data, process this routing, and output the distances and time-taken.
For our analysis we have selected a subset of postcodes for the High Peak Borough, containing 2,861 total postcodes, with an estimated population of around 90,000 people. The following code reads in a table of postcodes within this borough into a python list, which is then processed used to subset the overall postcode data.
print(f"CPU Routing complete in {cpu_time:.2f} seconds.")print(f"GPU Routing complete in {gpu_time:.2f} seconds.")print(f"Speedup: {cpu_time / gpu_time:.2f}x!")
CPU Routing complete in 71.41 seconds.
GPU Routing complete in 4.60 seconds.
Speedup: 15.51x!
It is clear that for this analysis, the total processing time is much slower when CPU bound over using the cugraph GPU library. This difference also extends when applied to larger datasets, as the GPU processing scales exponentially (within memory limits).
Figure 1 gives an overview of the output from this analysis. There is a clear correlation between proximity to GP practice and drive-time accessibility, as expected. Due to the mixture of rural and urban settings within the chosen area, there is a relatively large disparity in accessibility between postcodes, while urban centred, dense postcodes have times primarily below 10 minutes, there are a select few remote postcodes with times exceeding 40 minutes. Notably, their relative proximity to GP practices does not account for this large difference, meaning in these remote areas, the road network is responsible for lower accessibility.
Figure 1: Distance from postcodes to nearest GP practices (red) within High Peak
For peace of mind it is worth considering whether CPU or GPU processing gets different results. The mean difference appears to be around 1.2 seconds (0.02 minutes), with a maximum disagreement of 1.55 minutes.